Produced | From mid 1999 to 2005 |
---|---|
Common manufacturer(s) |
|
Max. CPU clock rate | 500 MHz to 2.33 GHz |
FSB speeds | 200 MT/s to 400 MT/s |
Min. feature size | 0.25 µm to 0.13 µm |
Instruction set | x86 |
Socket(s) | |
Core name(s) |
|
Athlon is the brand name applied to a series of x86-compatible microprocessors designed and manufactured by Advanced Micro Devices (AMD). The original Athlon (now called Athlon Classic) was the first seventh-generation x86 processor and, in a first for AMD, retained its initial performance lead over Intel's competing processors for a significant period of time. The original Athlon was also the first desktop processor to reach a clock speed of one gigahertz (GHz). AMD has continued using the Athlon name with the Athlon 64, an eighth-generation processor featuring the x86-64 (later renamed AMD64) architecture, and the Athlon II.
The Athlon made its debut on June 23, 1999. Athlon is the ancient Greek word for "Champion/trophy of the games".
AMD ex-CEO and founder Jerry Sanders developed strategic partnerships during the late 1990s to improve AMD's presence in the PC market based on the success of the AMD K6 architecture. One major partnership announced in 1998 paired AMD with semiconductor giant Motorola.[1] In the announcement, Sanders referred to the partnership as creating a "virtual gorilla" that would enable AMD to compete with Intel on fabrication capacity while limiting AMD's financial outlay for new facilities. This partnership also helped to co-develop copper-based semiconductor technology, which would become a cornerstone of the K7 production process.
In August 1999, AMD released the Athlon (K7) processor. Notably, the design team was led by Dirk Meyer, who had worked as a lead engineer on multiple Alpha microprocessors during his employment at DEC. Jerry Sanders had approached many of the engineering staff to work for AMD as DEC wound down their semiconductor business, and brought in a near-complete team of engineering experts. The balance of the Athlon design team comprised AMD K5 and K6 veterans.
By working with Motorola, AMD was able to refine copper interconnect manufacturing to the production stage about one year before Intel. The revised process permitted 180-nanometer processor production. The accompanying die-shrink resulted in lower power consumption, permitting AMD to increase Athlon clock speeds to the 1 GHz range.[2] Yields on the new process exceeded expectations, permitting AMD to deliver high speed chips in volume in March 2000.
Internally, the Athlon is a fully seventh generation x86 processor, the first of its kind. Like the AMD K5 and K6, the Athlon dynamically buffers internal micro-instructions at runtime resulting from parallel x86 instruction decoding. The CPU is an out-of-order design, again like previous post-5x86 AMD CPUs. The Athlon utilizes the Alpha 21264's EV6 bus architecture with double data rate (DDR) technology. This means that at 100 MHz, the Athlon front side bus actually transfers at a rate similar to a 200 MHz single data rate bus (referred to as 200 MT/s), which was superior to the method used on Intel's Pentium III (with SDR bus speeds of 100 MHz and 133 MHz).
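The relationship between bus clock and effective transfer rate described above can be sketched in a few lines of Python. The function names are illustrative, and the 64-bit data path width used for the bandwidth figure is an assumption that the text above does not state.

```python
def transfers_per_second(clock_mhz: float, transfers_per_clock: int) -> float:
    """Effective transfer rate in MT/s for a bus that moves data N times per clock."""
    return clock_mhz * transfers_per_clock


def peak_bandwidth_mb_s(mt_s: float, bus_width_bits: int = 64) -> float:
    """Theoretical peak bandwidth in MB/s (bus width is an assumed parameter)."""
    return mt_s * bus_width_bits / 8


athlon_fsb = transfers_per_second(100, 2)  # double data rate EV6 bus at 100 MHz
p3_fsb = transfers_per_second(133, 1)      # single data rate bus at 133 MHz

print("Athlon FSB:", athlon_fsb, "MT/s,", peak_bandwidth_mb_s(athlon_fsb), "MB/s peak")
print("Pentium III FSB:", p3_fsb, "MT/s,", peak_bandwidth_mb_s(p3_fsb), "MB/s peak")
```

These are theoretical peaks only; sustained throughput depends on the chipset and memory.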
AMD designed the CPU with more robust x86 instruction decoding capabilities than that of K6, to enhance its ability to keep more data in-flight at once. The Athlon's three decoders could potentially decode three x86 instructions to six microinstructions per clock, although this was somewhat unlikely in real-world use.[3] The critical branch predictor unit, essential to keeping the pipeline busy, was enhanced compared to what was on board the K6. Deeper pipelining with more stages allowed higher clock speeds to be attained.[4] Whereas the AMD K6-III+ topped out at 570 MHz due to its short pipeline, even when built on the 180 nm process, the Athlon was capable of clocking much higher.
AMD ended its long-time handicap with floating point x87 performance by designing a super-pipelined, out-of-order, triple-issue floating point unit.[3] Each of its three units was tailored to a particular type of instruction, with some redundancy between them. By having separate units, it was possible to operate on more than one floating point instruction at once.[3] This FPU was a huge step forward for AMD. While the K6 FPU had looked anemic compared to the Intel P6 FPU, with Athlon this was no longer the case.[5]
The 3DNow! floating point SIMD technology, again present, received some revisions and a name change to "Enhanced 3DNow!". Additions included DSP instructions and an implementation of the extended MMX subset of Intel SSE.[6]
The Athlon's CPU cache consisted of the typical two levels. Athlon was the first x86 processor with a 128 kB[7] split level 1 cache; a 2-way associative, later 16-way, cache separated into 2×64 kB for data and instructions (Harvard architecture).[3] This cache was double the size of K6's already large 2×32 kB cache, and quadruple the size of Pentium II and III's 2×16 kB L1 cache. The initial Athlon (Slot A, later called Athlon Classic) used 512 kB of level 2 cache separate from the CPU, on the processor cartridge board, running at 50% to 33% of core speed. This was done because the 250 nm manufacturing process was too large to allow for on-die cache while maintaining cost-effective die size. Later Athlon CPUs, afforded greater transistor budgets by smaller 180 nm and 130 nm process nodes, moved to on-die L2 cache at full CPU clock speed.
The Athlon, later called Athlon Classic, launched on June 23, 1999 and was generally available in August of that year. It demonstrated superior performance compared to the reigning champion, Intel's Pentium III, in every benchmark.[8]
The Athlon Classic is a cartridge-based processor. The design, called Slot A, was similar to Intel's Slot 1 cartridge used for Pentium II and Pentium III. The mating motherboard receptacle was the same part used with Intel products but keyed differently to prevent installation of the wrong CPU. The cartridge assembly allowed the use of higher speed cache memory than can be put on the motherboard. Like Pentium II and the Katmai-based Pentium III, the Athlon Classic contained 512 kB of L2 cache. This cache, again like its competitors, ran at a fraction of the core clock rate and had its own 64-bit bus, called a "back-side bus", that allowed concurrent system front side bus and cache accesses.[9] Initially, the L2 cache was run at half the CPU clock speed, on Athlon CPUs clocked up to 700 MHz. Faster Slot-A processors were forced to compromise with cache clock speed and ran at 2/5 (up to 850 MHz) or 1/3 (up to 1 GHz).[10] The SRAM available at the time was incapable of matching the Athlon's clock scalability, due both to cache chip technology limitations and electrical/cache latency complications of running an external cache at such a high speed.
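As a rough illustration of the cache dividers listed above, the following sketch computes the resulting L2 clock for a few core speeds. The table and function name are hypothetical; only the ratios and their clock ceilings come from the text.

```python
from fractions import Fraction

# (highest core clock in MHz using this ratio, L2-to-core clock ratio)
L2_DIVIDERS = [
    (700, Fraction(1, 2)),
    (850, Fraction(2, 5)),
    (1000, Fraction(1, 3)),
]


def l2_clock_mhz(core_mhz: int) -> float:
    """Return the external L2 cache clock for a Slot A Athlon at core_mhz."""
    for max_core_mhz, ratio in L2_DIVIDERS:
        if core_mhz <= max_core_mhz:
            return float(core_mhz * ratio)
    raise ValueError("no Slot A divider listed above 1000 MHz")


for core in (500, 700, 850, 1000):
    print(core, "MHz core ->", round(l2_clock_mhz(core), 1), "MHz L2")
```

At the top speed of each divider the cache clock comes out between roughly 333 MHz and 350 MHz, which is consistent with the SRAM, rather than the core, being the limiting factor.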
The Slot-A Athlons were the first multiplier-locked CPUs from AMD. This was partly done to hinder CPU remarking being done by questionable resellers around the globe. AMD's older CPUs could simply be set to run at whatever clock speed the user chose on the motherboard, making it trivial to relabel a CPU and sell it as a faster grade than it was originally intended. These relabeled CPUs were not always stable, being overclocked and not tested properly, and this was damaging to AMD's reputation. Although the Athlon was multiplier locked, crafty enthusiasts eventually discovered that a connector on the PCB of the cartridge could control the multiplier. Eventually a product called the "Goldfingers device" was created that could unlock the CPU, named after the gold connector pads on the processor board that it attached to.[11]
In commercial terms, the Athlon "Classic" was an enormous success—not just because of its own merits, but also because Intel endured a series of major production, design, and quality control issues at this time. In particular, Intel's transition to the 180 nm production process, starting in late 1999 and running through to mid-2000, suffered delays. There was a shortage of Pentium III parts. In contrast, AMD enjoyed a remarkably smooth process transition and had ample supplies available, causing Athlon sales to become quite strong.
The Argon-based Athlon contained 22 million transistors and measured 184 mm². It was fabricated by AMD in a slightly modified version of their CS44E process, a 0.25 µm complementary metal–oxide–semiconductor (CMOS) process with six levels of aluminium interconnect.[12][13] "Pluto" and "Orion" Athlons were fabricated in a 0.18 µm process.
The second generation Athlon, the Thunderbird, debuted on June 5, 2000. This version of the Athlon shipped in a more traditional pin-grid array (PGA) format that plugged into a socket ("Socket A") on the motherboard (it also shipped in the slot A package). It was sold at speeds ranging from 600 MHz to 1.4 GHz (Athlon Classics using the Slot A package could clock up to 1 GHz). The major difference, however, was cache design. Just as Intel had done when they replaced the old Katmai-based Pentium III with the much faster Coppermine-based Pentium III, AMD replaced the 512 kB external reduced-speed cache of the Athlon Classic with 256 kB of on-chip, full-speed exclusive cache. As a general rule, more cache improves performance, but faster cache improves it further still.[14]
AMD changed cache design significantly with the Thunderbird core. With the older Athlon CPUs, the CPU caching was of an inclusive design where data from the L1 is duplicated in the L2 cache. Thunderbird moved to an exclusive design where the L1 cache's contents are not duplicated in the L2. This increases total cache size of the processor and effectively makes caching behave as if there is a very large L1 cache with a slower region (the L2) and a very fast region (the L1).[15] Because of Athlon's very large L1 cache and the exclusive design which turns the L2 cache into basically a "victim cache", the need for high L2 performance and size was lessened. AMD kept the 64-bit L2 cache data bus from the older Athlons, as a result, and allowed it to have a relatively high latency. A simpler L2 cache reduced the possibility of the L2 cache causing clock scaling and yield issues. Still, instead of the 2-way associative scheme used in older Athlons, Thunderbird did move to a more efficient 16-way associative layout.[14]
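The capacity effect of an exclusive hierarchy can be illustrated with a toy simulation. This is a minimal sketch, not a model of the real hardware: it assumes fully associative caches with LRU replacement, and the line counts (4 in L1, 8 in L2) and all class and function names are hypothetical choices made for the example.

```python
from collections import OrderedDict


class TwoLevelCache:
    """Toy fully associative, LRU two-level cache (not a model of real K7 hardware)."""

    def __init__(self, l1_lines: int, l2_lines: int, exclusive: bool):
        self.l1 = OrderedDict()
        self.l2 = OrderedDict()
        self.l1_lines, self.l2_lines = l1_lines, l2_lines
        self.exclusive = exclusive
        self.hits = 0

    def _put(self, level: OrderedDict, capacity: int, line: int):
        """Insert line as most recently used; return the evicted line, if any."""
        victim = None
        if line not in level and len(level) >= capacity:
            victim, _ = level.popitem(last=False)  # evict least recently used
        level[line] = True
        level.move_to_end(line)
        return victim

    def access(self, line: int) -> None:
        if line in self.l1:
            self.hits += 1
            self.l1.move_to_end(line)
            return
        if line in self.l2:
            self.hits += 1
            if self.exclusive:
                del self.l2[line]  # line moves up; no copy stays in L2
            else:
                self.l2.move_to_end(line)
        elif not self.exclusive:
            dropped = self._put(self.l2, self.l2_lines, line)  # fill L2 as well
            if dropped is not None:
                self.l1.pop(dropped, None)  # keep L1 a subset of L2
        victim = self._put(self.l1, self.l1_lines, line)
        if victim is not None and self.exclusive:
            self._put(self.l2, self.l2_lines, victim)  # L2 acts as a victim cache


def run(exclusive: bool, working_set: int = 12, passes: int = 3) -> int:
    """Cycle through working_set lines several times and count cache hits."""
    cache = TwoLevelCache(l1_lines=4, l2_lines=8, exclusive=exclusive)
    for _ in range(passes):
        for line in range(working_set):
            cache.access(line)
    return cache.hits


print("inclusive hits:", run(exclusive=False))
print("exclusive hits:", run(exclusive=True))
```

With a 12-line working set, the exclusive configuration holds all 12 distinct lines across both levels, while the inclusive one holds at most 8 because L1 only duplicates L2. The cyclic access pattern is deliberately a worst case for LRU, so it exaggerates the difference.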
The Thunderbird was AMD's most successful product since the Am386DX-40 ten years earlier. Mainboard designs had improved considerably by this time, and the initial trickle of Athlon mainboard makers had swollen to include every major manufacturer. AMD's new fab in Dresden came online, allowing further production increases, and the process technology was improved by a switch to copper interconnects. In October 2000, the Athlon "C" was introduced, raising the mainboard front side bus speed from 100 MHz to 133 MHz (266 MT/s) and providing roughly 10% extra performance per clock over the "B" model Thunderbird.
AMD released the third-generation Athlon, code-named "Palomino", on October 9, 2001 as the Athlon XP. The "XP" suffix is interpreted to mean eXtreme Performance and also as an unofficial reference to Microsoft Windows XP.[16] The Athlon XP was marketed using a PR system, which compared its relative performance to an Athlon utilizing the earlier "Thunderbird" core. Athlon XP launched at speeds between 1.33 GHz (PR1500+) and 1.53 GHz (PR1800+), giving AMD the x86 performance lead with the 1800+ model. Less than a month later, AMD extended that lead with the release of the 1.6 GHz 1900+,[17] followed by the 1.67 GHz Athlon XP 2000+ in January 2002.
Palomino was the first K7 core to include the full SSE instruction set from the Intel Pentium III, as well as AMD's 3DNow! Professional. It is roughly 10% faster than Thunderbird at the same clock speed, thanks in part to the new SIMD functionality and to several additional improvements. The core has enhancements to the K7's TLB architecture and added a hardware data prefetch mechanism to take better advantage of available memory bandwidth.[18] Palomino was also the first socketed Athlon officially supporting dual processing, with chips certified for that purpose branded as the Athlon MP.[19]
Changes in core layout also resulted in Palomino being more frugal with its electrical demands, consuming approximately 20% less power than its predecessor, and thus reducing heat output comparatively as well.[20] While the preceding Athlon "Thunderbird" was capable of clock speeds exceeding 1400 MHz, the power and thermal considerations required to reach those speeds would have made it increasingly impractical as a marketable product. Thus, Palomino's goals of lowered power consumption (and resultant heat produced) allowed AMD to increase performance within a reasonable power envelope. Palomino's design also allowed AMD to continue using the same 180 nm manufacturing process node and core voltages as Thunderbird.
The Palomino core actually debuted earlier in the mobile market, branded as the Mobile Athlon 4 with the codename "Corvette". It used a ceramic interposer much like the Thunderbird, instead of the organic pin grid array package used on all later Palomino processors.[18]
The fourth-generation Athlon Thoroughbred was released on 10 June 2002 at 1.8 GHz (Athlon XP PR2200+). The "Thoroughbred" core marked AMD's first production 130 nm silicon, resulting in a significant reduction in die size compared to its 180 nm predecessor.
There came to be two steppings (revisions) of this core, commonly referred to as Tbred-A (cpuid:6 8 0) and Tbred-B (cpuid:6 8 1).[21] The initial version (later known as A) was simply a direct die shrink of the Palomino, and demonstrated that AMD had successfully transitioned to a 130 nm process. While successful in reducing the production cost per processor, the unmodified Palomino design did not demonstrate the expected reduction in heat and improvement in clock scalability usually seen when a design is shrunk to a smaller process. As a result, AMD was not able to increase Thoroughbred-A clock speeds much above those of the Palomino it was to replace. Tbred-A was only sold in versions from 1333 MHz to 1800 MHz, and was only able to displace the more production-costly Palomino from AMD's lineup.
AMD thus reworked the Thoroughbred's design to better match the process node on which it was produced, in turn creating the Thoroughbred-B. A significant aspect of this redesign was the addition of a ninth metal layer to the already quite complex eight-layer Thoroughbred-A. For comparison, the competing Pentium 4 Northwood used only six layers, and its successor Prescott seven. While adding layers does not in itself improve performance, it gives chip designers more flexibility in routing electrical pathways within a chip and, importantly for the Thoroughbred core, more flexibility in working around electrical bottlenecks that prevented the processor from attaining higher clock speeds. The Tbred-B offered a startling improvement in headroom over the Tbred-A, which made it very popular for overclocking. The Tbred-A often struggled to reach clock speeds above 1.9 GHz, while the Tbred-B could often easily reach 2.3 GHz and above.[22]
The Thoroughbred line received an increased front side bus clock during its lifetime, from 133 MHz (266 MT/s) to 166 MHz (333 MT/s). This improved the processor's memory access and I/O efficiency, and resulted in improved per-clock performance. AMD shifted their PR rating scheme accordingly, making lower clock speeds equate to higher PR ratings.
The Thoroughbred-B was the direct basis for its successor: the Tbred-B with an additional 256 kB of L2 cache (for 512 kB total) became the Barton core.
Fifth-generation Athlon Barton-core processors were released in early 2003 with PR ratings of 2500+, 2600+, 2800+, 3000+, and 3200+. While not operating at higher clock rates than Thoroughbred-core processors, they were marked with higher PR ratings because they featured an increased 512 kB L2 cache; later models additionally supported an increased 200 MHz (400 MT/s) front side bus.[23] The Thorton core was a later variant of the Barton with half of the L2 cache disabled, and thus was functionally identical to the Thoroughbred-B core. The name Thorton is a portmanteau of Thoroughbred and Barton.
By the time of Barton's release, the Northwood-based Pentium 4 had become more than competitive with AMD's processors.[24] Unfortunately for AMD, a simple increase in size of the L2 cache to 512 kB did not have nearly the same impact as it did for Intel's Pentium 4 line, as the Athlon architecture was not nearly as cache-constrained as the Pentium 4. The Athlon's exclusive-cache architecture and shorter pipeline made it less sensitive to L2 cache size, and the Barton gained only several percent in per-clock performance over the Thoroughbred-B it was derived from.[23] While the increased performance was welcome, it was not sufficient to overtake the Pentium 4 line in overall performance. The PR rating also became somewhat inaccurate because some Barton models with lower clock rates were being given higher PR ratings than higher-clocked Thoroughbred processors. When a computational task did not benefit enough from the additional cache to make up for the loss in raw clock speed, a lower-rated (but faster-clocked) Thoroughbred would outperform a higher-rated (but lower-clocked) Barton.[24]
The Barton was also used to officially introduce a higher 400 MT/s bus clock for the Socket A platform, which was used to gain some Barton models more efficiency (and increased PR ratings). However, it was clear by this time that Intel's quad-pumped bus was scaling well above AMD's double-pumped EV6 bus. The 800 MT/s bus used by many later Pentium 4 processors was well out of the Athlon XP's reach. In order to reach the same bandwidth levels, the Athlon XP's bus would have to be clocked at levels simply unreachable.[23]
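The gap between the two bus designs comes down to simple arithmetic, sketched below. The function name is illustrative, and the comparison assumes both buses have the same data width, so that matching the transfer rate also matches the bandwidth.

```python
def required_clock_mhz(target_mt_s: float, transfers_per_clock: int) -> float:
    """Base clock needed to reach target_mt_s when moving data N times per clock."""
    return target_mt_s / transfers_per_clock


TARGET_MT_S = 800  # bus rate of many later Pentium 4 processors, per the text

quad_pumped = required_clock_mhz(TARGET_MT_S, 4)    # Intel's quad-pumped bus
double_pumped = required_clock_mhz(TARGET_MT_S, 2)  # AMD's double-pumped EV6 bus

print("quad-pumped base clock:", quad_pumped, "MHz")
print("double-pumped base clock:", double_pumped, "MHz")
print("ratio to the 200 MHz Barton bus:", double_pumped / 200)
```

Under these assumptions the EV6 bus would need twice the base clock of the fastest official Socket A bus to keep pace.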
By this point, the four-year-old Athlon EV6 bus architecture had scaled to its limit. To maintain or exceed the performance of Intel's newer processors would require a significant redesign.[23] The K7-derived Athlons were replaced in September 2003 by the Athlon 64 family, which featured an on-chip memory controller and a completely new HyperTransport bus to replace EV6.
Barton (130 nm)
Thorton (130 nm)
Mobile Athlon XPs (Athlon XP-M) are identical to normal Athlon XPs, apart from running at lower voltages, often lower bus speeds, and not being multiplier-locked. The lower Vcore rating caused the CPU to have lower power consumption (ideal for battery-powered laptops) and lower heat production. Athlon XP-M CPUs also have a higher-rated heat tolerance, a requirement of the tight conditions within a notebook PC.
The Athlon XP-M replaced the older Mobile Athlon 4. The Mobile Athlon 4 used the older Palomino core, while the Athlon XP-M used the newer Thoroughbred and Barton cores. Some specialized low-power Athlon XP-Ms utilize the microPGA socket 563 rather than the standard Socket A.
The CPUs, like their mobile K6+ predecessors, were also capable of dynamic clock adjustment for power optimization. When the system is idle, the CPU clocks itself down through a lower bus multiplier and also reduces its voltage. Then, when a program demands more computational resources, the CPU very quickly (there is some latency) returns to intermediate or maximum speed to meet the demand. This technology was marketed as "PowerNow!". It was similar to Intel's SpeedStep power saving technique. The feature was controlled by the CPU, motherboard BIOS, and operating system. AMD later renamed the technology to Cool'n'Quiet on their K8-based CPUs (Athlon 64, etc.), and introduced it for use on desktop PCs as well.
Athlon XP-Ms were popular with desktop overclockers, as well as underclockers. The lower voltage requirement and higher heat rating resulted in CPUs that were basically "cherry picked" from the manufacturing line. Being the best of the cores off the line, the CPUs typically were more reliably overclocked than their desktop-headed counterparts. Also, the fact that they were not locked to a single multiplier was a significant simplification for the overclocking process. Some Barton core Athlon XP-Ms have been successfully overclocked to as high as 3.1 GHz.
As stated, the chips were also liked for their underclocking ability. Underclocking, as the term is used here, is the process of determining the lowest Vcore at which a CPU remains stable at a given clock speed. The Athlon XP-M CPUs were capable of running at lower voltages for a given clock rate than their desktop siblings. As such, the chips were used in home theater PC systems due to their high performance and low heat output at low Vcore settings.
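The effect of a lower Vcore on power draw can be estimated with the common first-order approximation that dynamic CMOS power scales with voltage squared times frequency. The sketch below applies that approximation only. It ignores leakage, and the voltages and clock speeds are made-up illustrative values rather than AMD specifications.

```python
def relative_dynamic_power(vcore: float, clock_mhz: float,
                           ref_vcore: float, ref_clock_mhz: float) -> float:
    """Dynamic power relative to a reference point, using P ∝ V² · f."""
    return (vcore / ref_vcore) ** 2 * (clock_mhz / ref_clock_mhz)


# Hypothetical reference operating point and two hypothetical alternatives.
REF_VCORE, REF_CLOCK = 1.65, 1800

same_clock_lower_v = relative_dynamic_power(1.35, 1800, REF_VCORE, REF_CLOCK)
idle_state = relative_dynamic_power(1.20, 600, REF_VCORE, REF_CLOCK)

print(f"undervolted at full clock: {same_clock_lower_v:.2f}x reference power")
print(f"downclocked and undervolted: {idle_state:.2f}x reference power")
```

Because voltage enters squared, even a modest Vcore reduction at an unchanged clock yields a sizeable estimated saving.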
Besides having an unlocked multiplier, Athlon XP-Ms also did not have SMP operation disabled, as other Athlon XPs did. They could therefore be used in place of the more expensive Athlon MP in dual Socket A motherboards. Since those boards lacked multiplier and voltage adjustments and could run only a 133 MHz FSB, such adjustments had to be made by wire-modding the CPU socket, connecting adjacent CPU pins. It was common to overclock a mobile 2500+ CPU to 2.26 GHz with a 17x multiplier, making it faster than the fastest official Athlon MP, the 2800+ running at 2.13 GHz.
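The clock figures quoted above follow from multiplying the front side bus clock by the CPU multiplier. A minimal sketch follows. The function name is illustrative, and the 16x multiplier for the 2800+ MP is inferred from the 2.13 GHz figure rather than stated in the text.

```python
def core_clock_mhz(multiplier: float, fsb_mhz: float = 133) -> float:
    """Core clock in MHz as the product of the multiplier and the FSB clock."""
    return multiplier * fsb_mhz


overclocked_mobile = core_clock_mhz(17)  # mobile 2500+ with a 17x multiplier
athlon_mp_2800 = core_clock_mhz(16)      # 16x is an inferred value, see above

print(f"mobile 2500+ at 17x: {overclocked_mobile / 1000:.2f} GHz")
print(f"Athlon MP 2800+ at 16x: {athlon_mp_2800 / 1000:.2f} GHz")
```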
This article was originally based on material from the Free On-line Dictionary of Computing, which is licensed under the GFDL.